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Abstract — Wideband analog signals push contemporary 
analog-to-digital conversion systems to their performance limits. 
In many applications, however, sampling at the Nyquist rate 
is inefficient because the signals of interest contain only a 
small number of significant frequencies relative to the ban- 
dlimit, although the locations of the frequencies may not be 
known a priori. For this type of sparse signal, other sampling 
strategies are possible. This paper describes a new type of 
data acquisition system, called a random demodulator, that is 
constructed from robust, readily available components. Let K 
denote the total number of frequencies in the signal, and let 
W denote its bandlimit in Hz. Simulations suggest that the 
random demodulator requires just 0{K \og(W / K)) samples per 
second to stably reconstruct the signal. This sampling rate is 
exponentially lower than the Nyquist rate of W Hz. In contrast 
with Nyquist sampling, one must use nonlinear methods, such 
as convex programming, to recover the signal from the samples 
taken by the random demodulator. This paper provides a detailed 
theoretical analysis of the system's performance that supports the 
empirical observations. 

Index Terms — analog-to-digital conversion, compressive sam- 
pling, sampling theory, signal recovery, sparse approximation 
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I. Introduction 

THE Shannon sampling theorem is one of the founda- 
tions of modern signal processing. For a continuous-time 
signal / whose highest frequency is less than W/2 Hz, the 
theorem suggests that we sample the signal uniformly at a 
rate of W Hz. The values of the signal at intermediate points 
in time are determined completely by the cardinal series 



f{t)-Pcit) 



(1) 



In practice, one typically samples the signal at a somewhat 
higher rate and reconstructs with a kernel that decays faster 
than the sine function [1, Ch. 4]. 

This well-known approach becomes impractical when the 
bandlimit W is large because it is challenging to build sam- 
pling hardware that operates at a sufficient rate. The demands 
of many modern applications already exceed the capabilities 
of current technology. Even though recent developments in 

Submitted: 30 January 2009. Revised: 12 September 2009. A preliminary 
report on this work was presented by tlie first author at SampTA 2007 in 
Thessaloniki. 

JAT was supported by ONR N00014-08-1-0883, DARPA/ONR N66001- 
06-1-2011 and N6600 1-08- 1-2065, and NSF DMS-0503299. JNL, MFD, and 
RGB were supported by DARPA/ONR N6600 1-06-1 -20 1 1 and N66001-08- 
1-2065, ONR N00014-07-1-0936, AFOSR FA9550-04-1-0148, NSF CCF- 
0431150, and the Texas Instruments Leadership University Program. JR was 
supported by NSF CCF-515632. 




Fig. 1 . Block diagram for the random demodulator The components include 
a random number generator, a mixer, an accumulator, and a sampler. 



analog-to-digital converter (ADC) technologies have increased 
sampling speeds, state-of-the-art architectures are not yet ad- 
equate for emerging applications, such as ultrawideband and 
radar systems because of the additional requirements on power 
consumption [2]. The time has come to explore alternative 
techniques [3]. 

A. The Random Demodulator 

In the absence of extra information, Nyquist-rate sampling 
is essentially optimal for bandlimited signals [4]. Therefore, 
we must identify other properties that can provide additional 
leverage. Fortunately, in many applications, signals are also 
sparse. That is, the number of significant frequency compo- 
nents is often much smaller than the bandlimit allows. We can 
exploit this fact to design new kinds of sampling hardware. 

This paper studies the performance of a new type of 
sampling system — called a random demodulator — that can be 
used to acquire sparse, bandlimited signals. Figure [T] displays 
a block diagram for the system, and Figure [2] describes the 
intuition behind the design. In summary, we demodulate the 
signal by multiplying it with a high-rate pseudonoise sequence, 
which smears the tones across the entire spectrum. Then 
we apply a lowpass anti-aliasing filter, and we capture the 
signal by sampling it at a relatively low rate. As illustrated 
in Figure |3] the demodulation process ensures that each tone 
has a distinct signature within the passband of the filter Since 
there are few tones present, it is possible to identify the tones 
and their amplitudes from the low-rate samples. 

The major advantage of the random demodulator is that 
it bypasses the need for a high-rate ADC. Demodulation is 
typically much easier to implement than sampling, yet it 
allows us to use a low-rate ADC. As a result, the system 
can be constructed from robust, low-power, readily available 
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Fig. 2. Action of the demodulator on a pure tone. The demodulation process 
multiplies the continuous-time input signal by a random square wave. The 
action of the system on a single tone is illustrated in the time domain (left) 
and the frequency domain (right). The dashed line indicates the frequency 
response of the lowpass filter. See Figure [3] for an enlargement of the filter's 
passband. 




Fig. 3. Signatures of two different tones. The random demodulator furnishes 
each frequency with a unique signature that can be discerned by examining 
the passband of the antialiasing filter. This image enlarges the pass region 
of the demodulator's output for two input tones (solid and dashed). The two 
signatures are nearly orthogonal when their phases are taken into account. 



components even while it can acquire higher-bandlimit signals 
than traditional sampling hardware. 

We do pay a price for the slower sampling rate: It is no 
longer possible to express the original signal / as a linear 
function of the samples, a la the cardinal series ([T]). Rather, / is 
encoded into the measurements in a more subtle manner. The 
reconstruction process is highly nonlinear, and must carefully 
take advantage of the fact that the signal is sparse. As a result, 
signal recovery becomes more computationally intensive. In 
short, the random demodulator uses additional digital pro- 
cessing to reduce the burden on the analog hardware. This 
tradeoff seems acceptable, as advances in digital computing 
have outpaced those in analog-to-digital conversion. 

B. Results 

Our simulations provide striking evidence that the ran- 
dom demodulator performs. Consider a periodic signal with 
a bandlimit of W/2 Hz, and suppose that it contains K 
tones with random frequencies and phases. Our experiments 
below show that, with high probability, the system acquires 
enough information to reconstruct the signal after sampling 
at just 0{Klog{W/K)) Hz. In words, the sampling rate is 
proportional to the number K of tones and the logarithm of 
the bandwidth W. In contrast, the usual approach requires 



sampling at W Hz, regardless of K. In other words, the ran- 
dom demodulator operates at an exponentially slower sampling 
rate! We also demonstrate that the system is effective for 
reconstructing simple communication signals. 

Our theoretical work supports these empirical conclusions, 
but it results in slightly weaker bounds on the sampling 
rate. We have been able to prove that a sampling rate of 
0(A'logM^ + log'^ W) suffices for high-probability recovery 
of the random signals we studied experimentally. This analysis 
also suggests that there is a small startup cost when the number 
of tones is small, but we did not observe this phenomenon in 
our experiments. It remains an open problem to explain the 
computational results in complete detail. 

The random signal model arises naturally in numerical 
experiments, but it does not provide an adequate description 
of real signals, whose frequencies and phases are typically far 
from random. To address this concern, we have established 
that the random demodulator can acquire all if -tone signals — 
regardless of the frequencies, amplitudes, and phases — when 
the sampling rate is 0{K log® W). In fact, the system does not 
even require the spectrum of the input signal to be sparse; the 
system can successfully recover any signal whose spectrum 
is well-approximated by K tones. Moreover, our analysis 
shows that the random demodulator is robust against noise 
and quantization errors. 

This work focuses on input signals drawn from a specific 
mathematical model, framed in Section [ll] Many real signals 
have sparse spectral occupancy, even though they do not meet 
all of our formal assumptions. We propose a device, based on 
the classical idea of windowing, that allows us to approximate 
general signals by signals drawn from our model. Therefore, 
our recovery results for the idealized signal class extend to 
signals that we are likely to encounter in practice. 

In summary, we believe that these empirical and theoretical 
results, taken together, provide compelling evidence that the 
demodulator system is a powerful alternative to Nyquist-rate 
sampling for sparse signals. 



C. Outline 

In Section |ll] we present a mathematical model for the 
class of sparse, bandlimited signals. Section [III] describes the 
intuition and architecture of the random demodulator, and it 
addresses the nonidealities that may affect its performance. In 
Section |IV] we model the action of the random demodulator 
as a matrix. Section [V] describes computational algorithms 
for reconstructing frequency-sparse signals from the coded 
samples provided by the demodulator We continue with an 



empirical study of the system in Section VI and we offer 



some theoretical results in Section VII that partially explain 



the system's performance. Section VIII discusses a windowing 
technique that allows the demodulator to capture nonperi- 
odic signals. We conclude with a discussion of potential 



technological impact and related work in Sections IX and 



Section [X| Appendices [I] [II] and III contain proofs of our signal 
reconstruction theorems. 
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II. The Signal Model 



B. Information Content of Signals 



Our analysis focuses on a class of discrete, multitone signals 
that have three distinguished properties: 

• Bandlimited. The maximum frequency is bounded. 

• Periodic. Each tone has an integral frequency in Hz. 

• Sparse. The number of active tones is small in compar- 
ison with the bandlimit. 

Our work shows that the random demodulator can recover 
these signals very efficiently. Indeed, the number of samples 
per unit time scales directly with the sparsity, but it increases 
only logarithmically in the bandlimit. 

At first, these discrete multitone signals may appear simpler 
than the signals that arise in most applications. For example, 
we often encounter signals that contain nonharmonic tones 
or signals that contain continuous bands of active frequencies 
rather than discrete tones. Nevertheless, these broader signal 
classes can be approximated within our model by means of 



windowing techniques. We address this point in Section VIII 



A. Mathematical Model 

Consider the following mathematical model for a class of 
discrete multitone signals. Let W/2 be a positive integer that 
exceeds the highest frequency present in the continuous-time 
signal /. Fix a number K that represents the number of active 
tones. The model contains each signal of the form 



Here, is a set of K integer-valued frequencies that satisfies 
f^c{0,±l,±2,...,±(T4^/2-l),iy/2}, 



and 



is a set of complex-valued amplitudes. We focus on the case 
where the number K of active tones is much smaller than the 
bandwidth W. 

To summarize, the signals of interest are bandlimited be- 
cause they contain no frequencies above W/2 cycles per 
second; periodic because the frequencies are integral; and 
sparse because the number of tones K <C W . 

Let us emphasize several conceptual points about this 
model: 

• We have normalized the time interval to one second for 
simplicity. Of course, it is possible to consider signals at 
another time resolution. 

• We have also normalized frequencies. To consider signals 
whose frequencies are drawn from a set equally spaced 
by A, we would change the effective bandlimit to W/ A. 

• The model also applies to signals that are sparse and 
bandlimited in a single time interval. It can be extended 
to signals where the model (|2]) holds with a different 
set of frequencies and amplitudes in each time interval 
[0, 1), [1, 2), [2, 3), . . . . These signals are sparse not in 
the Fourier domain but rather in the short-time Fourier 
domain [5, Ch. IV]. 



According to the sampling theorem, we can identify signals 
from the model (|2]i by sampling for one second at W Hz. 
Yet these signals contain only R = 0{K log{W/ K)) bits of 
information. In consequence, it is reasonable to expect that we 
can acquire these signals using only R digital samples. 

Here is one way to establish the information bound. 
Stirling's approximation shows that there are about 
exp{K log{W / K) + 0{K)} ways to select K distinct 
integers in the range {1, 2, . . . , W^}. Therefore, it takes 
0{K\og{W/K)) bits to encode the frequencies present in the 
signal. Each of the K amplitudes can be approximated with 
a fixed number of bits, so the cost of storing the frequencies 
dominates. 



C. Examples 

There are many situations in signal processing where we 
encounter signals that are sparse or locally sparse in frequency. 
Here are some basic examples: 

• Communications signals, such as transmissions with a 
frequency hopping modulation scheme that switches a 
sinusoidal carrier among many frequency channels ac- 
cording to a predefined (often pseudorandom) sequence. 
Other examples include transmissions with narrowband 
modulation where the carrier frequency is unknown but 
could lie anywhere in a wide bandwidth. 

• Acoustic signals, such as musical signals where each 
note consists of a dominant sinusoid with a progression 
of several harmonic overtones. 

• Slowly varying chirps, as used in radar and geophysics, 
that slowly increase or decrease the frequency of a 
sinusoid over time. 

• Smooth signals that require only a few Fourier coeffi- 
cients to represent. 

• Piecewise smooth signals that are differentiable except 
for a small number of step discontinuities. 

We also note several concrete applications where sparse 
wideband signals are manifest. Surveillance systems may 
acquire a broad swath of Fourier bandwidth that contains 
only a few communications signals. Similarly, cognitive radio 
applications rely on the fact that parts of the spectrum are 
not occupied [6], so the random demodulator could be used to 
perform spectrum sensing in certain settings. Additional poten- 
tial applications include geophysical imaging, where two- and 
three-dimensional seismic data can be modeled as piecewise 
smooth (hence, sparse in a local Fourier representation) [7], 
as well as radar and sonar imaging [8], [9]. 

III. The Random Demodulator 

This section describes the random demodulator system 
that we propose for signal acquisition. We first discuss the 
intuition behind the system and its design. Then we address 
some implementation issues and nonidealities that impact its 
performance. 
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A. Intuition 

The random demodulator performs three basic actions: 
demodulation, lowpass filtering, and low-rate sampling. Refer 
back to Figure [T] for the block diagram. In this section, we 
offer a short explanation of why this approach allows us to 
acquire sparse signals. 

Consider the problem of acquiring a single high-frequency 
tone that lies within a wide spectral band. Evidently, a low- 
rate sampler with an antialiasing filter is oblivious to any 
tone whose frequency exceeds the passband of the filter. The 
random demodulator deals with the problem by smearing the 
tone across the entire spectrum so that it leaves a signature 
that can be detected by a low-rate sampler. 

More precisely, the random demodulator forms a (periodic) 
square wave that randomly alternates at or above the Nyquist 
rate. This random signal is a sort of periodic approximation 
to white noise. When we multiply a pure tone by this random 
square wave, we simply translate the spectrum of the noise, as 
documented in Figure |2] The key point is that translates of the 
noise spectrum look completely different from each other, even 
when restricted to a narrow frequency band, which Figure [3] 
illustrates. 

Now consider what happens when we multiply a frequency- 
sparse signal by the random square wave. In the frequency 
domain, we obtain a superposition of translates of the noise 
spectrum, one translate for each tone. Since the translates are 
so distinct, each tone has its own signature. The original signal 
contains few tones, so we can disentangle them by examining 
a small slice of the spectrum of the demodulated signal. 

To that end, we perform lowpass filtering to prevent aliasing, 
and we sample with a low-rate ADC. This process results in 
coded samples that contain a complete representation of the 
original sparse signal. We discuss methods for decoding the 
samples in Section [V] 

B. System Design 

Let us present a more formal description of the random 
demodulator shown in Figure [T] The first two components 
implement the demodulation process. The first piece is a 
random number generator, which produces a discrete-time 
sequence eo,ei,£2--- of numbers that take values ±1 with 
equal probability. We refer to this as the chipping sequence. 
The chipping sequence is used to create a continuous-time 
demodulation signal Pc{t) via the formula 



Pc{t) 



t e 



n 71+1 
W 



and 71 = 0, 1, 



,W -I. 



In words, the demodulation signal switches between the levels 
±1 randomly at the Nyquist rate of W Hz. Next, the mixer 
multiplies the continuous-time input f{t) by the demodulation 
signal Pc{t) to obtain a continuous-time demodulated signal 

v{t) = f{t)-pS). te[o,i). 

Together these two steps smear the frequency spectrum of the 
original signal via the convolution 

rH = (F*p,)(w). 



See Figure |2] for a visual. 

The next two components behave the same way as a 
standard ADC, which performs lowpass filtering to prevent 
aliasing and then samples the signal. Here, the lowpass filter 
is simply an accumulator that sums the demodulated signal 
y{t) for \/R seconds. The filtered signal is sampled instan- 
taneously every \/R seconds to obtain a sequence {y„i] of 
measurements. After each sample is taken, the accumulator is 
reset. In summary, 

|.(m+l)/fl 

Vm^ R y{t) dt, 771 = 0, 1, . . . , i? - 1. 

Jm/R. 

This approach is called integrate-and-dump sampling. Finally, 
the samples are quantized to a finite precision. (In this work, 
we do not model the final quantization step.) 

The fundamental point here is that the sampling rate R is 
much lower than the Nyquist rate W . We will see that R 
depends primarily on the number K of significant frequencies 
that participate in the signal. 

C. Implementation and Nonidealities 

Any reasonable system for acquiring continuous-time sig- 
nals must be implementable in analog hardware. The sys- 
tem that we propose is built from robust, readily available 
components. This subsection briefly discusses some of the 
engineering issues. 

In practice, we generate the chipping sequence with a 
pseudorandom number generator. It is preferable to use pseu- 
dorandom numbers for several reasons: they are easier to 
generate; they are easier to store; and their structure can be 
exploited by digital algorithms. Many types of pseudorandom 
generators can be fashioned from basic hardware components. 
For example, the Mersenne twister [10] can be implemented 
with shift registers. In some applications, it may suffice just 
to fix a chipping sequence in advance. 

The performance of the random demodulator is unlikely 
to suffer from the fact that the chipping sequence is not 
completely random. We have been able to prove that if the 
chipping sequence consists of ^-wise independent random 
variables (for an appropriate value of £), then the demodulator 
still offers the same guarantees. Alon et al. have demonstrated 
that shift registers can generate a related class of random 
variables [11]. 

The mixer must operate at the Nyquist rate W. Nevertheless, 
the chipping sequence alternates between the levels ±1, so the 
mixer only needs to reverse the polarity of the signal. It is 
relatively easy to perform this step using inverters and mul- 
tiplexers. Most conventional mixers trade speed for linearity, 
i.e., fast transitions may result in incorrect products. Since 
the random demodulator only needs to reverse polarity of the 
signal, nonlinearity is not the primary nonideality. Instead, the 
bottleneck for the speed of the mixer is the settling times of 
inverters and multiplexors, which determine the length of time 
it takes for the output of the mixer to reach steady state. 

The sampler can be implemented with an off-the-shelf 
ADC. It suffers the same types of nonidealities as any ADC, 
including thermal noise, aperture jitter, comparator ambiguity. 
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and so forth [12]. Since the random demodulator operates at a 
relatively low sampling rate, we can use high-quality ADCs, 
which exhibit fewer problems. 

In practice, a high-fidelity integrator is not required. It 
suffices to perform lowpass filtering before the samples are 
taken. It is essential, however, that the impulse response of 
this filter can be characterized very accurately. 

The net effect of these nonideahties is much like the 
addition of noise to the signal. The signal reconstruction 
process is very robust, so it performs well even in the presence 
of noise. Nevertheless, we must emphasize that, as with any 
device that employs mixed signal technologies, an end-to-end 
random demodulator system must be calibrated so that the 
digital algorithms are aware of the nonidealities in the output 
of the analog hardware. 

IV. Random Demodulation in Matrix Form 

In the ideal case, the random demodulator is a linear system 
that maps a continuous-time signal to a discrete sequence of 
samples. To understand its performance, we prefer to express 
the system in matrix form. We can then study its properties 
using tools from matrix analysis and functional analysis. 

A. Discrete-Time Representation of Signals 

The first step is to find an appropriate discrete representation 
for the space of continuous-time input signals. To that end, 
note that each (l/W^)-second block of the signal is multiplied 
by a random sign. Then these blocks are aggregated, summed, 
and sampled. Therefore, part of the time-averaging performed 
by the accumulator commutes with the demodulation process. 
In other words, we can average the input signal over blocks 
of duration 1/W without affecting subsequent steps. 

Fix a time instant of the form f„ = n/W for an integer n. 
Let Xn denote the average value of the signal / over a time 
interval of length 1/W starting at t„. Thus, 

/t„+i/w 
f{t) dt 

-27riw/W _ 



wen 



-27ria't„ 



(3) 



frequency = 0, the 
|a;| < W/2, the bracket 



with the convention that, for the 
bracketed term equals l/W. Since 
never equals zero. Absorbing the brackets into the amplitude 
coefficients, we obtain a discrete-time representation a;„ of the 
signal fit): 

= 



Sijj 6 



forn = 0,l,...,VF-l 



where 



-27riu;/VK 



In particular, a continuous-time signal that involves only the 
frequencies in Q, can be viewed as a discrete-time signal 
comprised of the same frequencies. We refer to the complex 
vector s as an amplitude vector, with the understanding that 
it contains phase information as well. 



The nonzero components of the length- vector s are listed 
in the set fl. We may now express the discrete- time signal x 
as a matrix-vector product. Define the W x matrix 



1 



2■^inu/W^ 



where 



n = 0,l,...,W^-l and 
a; = 0,±l,...,± 



W 



w 

~2' 



The matrix F is a simply a permuted discrete Fourier trans- 
form (DFT) matrix. In particular, F is unitary and its entries 
share the magnitude W~^^'^. 

In summary, we can work with a discrete representation 



X = Fs 



of the input signal. 



B. Action of the Demodulator 

We view the random demodulator as a hnear system acting 
on the discrete form x of the continuous-time signal /. 

First, we consider the effect of random demodulation on 
the discrete-time signal. Let eo,e\,. . . , sw-i be the chipping 
sequence. The demodulation step multiplies each a;„, which is 
the average of / on the nth time interval, by the random sign 
e„. Therefore, demodulation corresponds to the map x i— > Dx 
where 

£1 



D 



ew-i 



is a W X W diagonal matrix. 

Next, we consider the action of the accumulate-and-dump 
sampler. Suppose that the sampling rate is R, and assume 
that R divides W. Then each sample is the sum of W/R 
consecutive entries of the demodulated signal. Therefore, the 
action of the sampler can be treated as an Rx W matrix H 
whose rth row has W/R consecutive unit entries starting in 
column rW/R+ 1 for each r = 0, 1, . . . , i? — 1. An example 
with i? = 3 and W = 12 is 



H 



1111 



1111 



1111 



When R does not divide W, two samples may share 
contributions from a single element of the chipping sequence. 
We choose to address this situation by allowing the matrix 
H to have fractional elements in some of its colunms. An 
example with R = 3 and W = 7 is 



H 



1 1 




1 1 



We have found that this device provides an adequate approxi- 
mation to the true action of the system. For some applications, 
it may be necessary to exercise more care. 
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In summary, the matrix M = HD describes the action of 
the hardware system on the discrete signal x. Each row of the 
matrix yields a separate sample of the input signal. 

The matrix $ — MF describes the overall action of the 
system on the vector s of amplitudes. This matrix €> has a 
special place in our analysis, and we refer to it as a random 
demodulator matrix. 

C. Prewhitening 

It is important to note that the bracket in (|3]l leads to 
a nonlinear attenuation of the amplitude coefficients. In a 
hardware implementation of the random demodulator, it may 
be advisable to apply a prewhitening filter to preserve the 
magnitudes of the amplitude coefficients. On the other hand, 
prewhitening amplifies noise in the high frequencies. We leave 
this issue for future work. 

V. Signal Recovery Algorithms 

The Shannon sampling theorem provides a simple linear 
method for reconstructing a bandlimited signal from its time 
samples. In contrast, there is no linear process for reconstruct- 
ing the input signal from the output of the random demodulator 
because we must incorporate the highly nonlinear sparsity 
constraint into the reconstruction process. 

Suppose that s is a sparse amplitude vector, and let y = €>s 
be the vector of samples acquired by the random demodulator 
Conceptually, the way to recover s is to solve the mathematical 
program 



s = argmin \\v\\q subject to = y 



(4) 



where the io function ||-||q counts the number of nonzero 
entries in a vector. In words, we seek the sparsest amplitude 
vector that generates the samples we have observed. The 
presence of the £o function gives the problem a combinatorial 
character, which may make it computationally difficult to solve 
completely. 

Instead, we resort to one of the signal recovery algorithms 
from the sparse approximation or compressive sampling lit- 
erature. These techniques fall in two rough classes: convex 
relaxation and greedy pursuit. We describe the advantages and 
disadvantages of each approach in the sequel. 

This work concentrates on convex relaxation methods be- 
cause they are more amenable to theoretical analysis. Our 
purpose here is not to advocate a specific algorithm but rather 
to argue that the random demodulator has genuine potential 
as a method for acquiring signals that are spectrally sparse. 
Additional research on algorithms will be necessary to make 



the technology viable. See Section IX-C for discussion of how 



quickly we can perform signal recovery with contemporary 
computing architectures. 

A. Convex Relaxation 

The problem Q is difficult because of the unfavorable 
properties of the £o function. A fundamental method for 
dealing with this challenge is to relax the Iq function to the £i 
norm, which may be viewed as the convex function "closest" 



to £q. Since the £i norm is convex, it can be minimized subject 
to convex constraints in polynomial time [13]. 

Let s be the unknown amplitude vector, and let y = $s be 
the vector of samples acquired by the random demodulator 
We attempt to identify the amplitude vector s by solving the 
convex optimization problem 



s = arg mm \ \v\\^ 



subject to = y. 



(5) 



In words, we search for an amplitude vector that yields the 
same samples and has the least £i norm. On account of the 
geometry of the £i ball, this method promotes sparsity in the 
estimate s. 

The problem (|5]l can be recast as a second-order cone 
program. In our work, we use an old method, iteratively 
reweighted least squares (IRLS), for performing the optimiza- 
tion [14, 173ff]. It is known that IRLS converges Unearly for 
certain signal recovery problems [15]. It is also possible to 
use interior-point methods, as proposed by Candes et al. [16]. 

Convex programming methods for sparse signal recovery 
problems are very powerful. Second-order methods, in partic- 
ular, seem capable of achieving very good reconstructions of 
signals with wide dynamic range. Recent work suggests that 
optimal first-order methods provide similar performance with 
lower computational overhead [17]. 

B. Greedy Pursuits 

To produce sparse approximate solutions to linear systems, 
we can also use another class of methods based on greedy 
pursuit. Roughly, these algorithms build up a sparse solution 
one step at a time by adding new components that yield the 
greatest immediate improvement in the approximation error 
An early paper of Gilbert and Tropp analyzed the performance 
of an algorithm called Orthogonal Matching Pursuit for simple 
compressive sampling problems [18]. More recently, work 
of Needell and Vershynin [19], [20] and work of Needell 
and Tropp [21] has resulted in greedy-type algorithms whose 
theoretical performance guarantees are analogous with those 
of convex relaxation methods. 

In practice, greedy pursuits tend to be effective for problems 
where the solution is ultra-sparse. In other situations, convex 
relaxation is usually more powerful. On the other hand, greedy 
techniques have a favorable computational profile, which 
makes them attractive for large-scale problems. 

C. Impact of Noise 

In applications, signals are more likely to be compressible 
than to be sparse. (Compressible signals are not sparse but 
can be approximated by sparse signals.) Nonidealities in the 
hardware system lead to noise in the measurements. Fur- 
thermore, the samples acquired by the random demodulator 
are quantized. Modern convex relaxation methods and greedy 
pursuit methods are robust against all these departures from 
the ideal model. We discuss the impact of noise on signal 



reconstruction algorithms in Section VII-B 



In fact, the major issue with all compressive sampling 
methods is not the presence of noise per se. Rather, the process 
of compressing the signal's information into a small number 
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of samples inevitably decreases the signal-to-noise ratio. In 
the design of compressive sampling systems, it is essential to 
address this issue. 

VI. Empirical Results 

We continue with an empirical study of the minimum 
measurement rate required to accurately reconstruct signals 
with the random demodulator. Our results are phrased in terms 
of the Nyquist rate W, the sampling rate R, and the sparsity 
level K. It is also common to combine these parameters into 
scale-free quantities. The compression factor R/W measures 
the improvement in the sampling rate over the Nyquist rate, 
and the sampling efficiency K/R measures the number of 
tones acquired per sample. Both these numbers range between 
zero and one. 

Our results lead us to an empirical rule for the sampling 
rate necessary to recover random sparse signals using the 
demodulator system: 

RKl.7K\og{WlK +1). (6) 

The empirical rate is similar to the weak phase transition 
threshold that Donoho and Tanner calculated for compressive 
sensing problems with a Gaussian sampling matrix [22]. The 
form of this relation also echoes the form of the information 
bound developed in Section [Il-B| 

A. Random Signal Model 

We begin with a stochastic model for frequency-sparse 
discrete signals. To that end, define the signum function 

sgn {r& ) = \ 

The model is described in the following box: 



Model (A) for a 


random amplitude vector s 


Frequencies: 




is a uniformly random set 
of K frequencies from 
{0,±l,...,±(W^/2-l),l¥/2} 


Amplitudes: 




is arbitrary for each G il 
for each w ^ 


Phases: 


sgn(sc^) 


is i.i.d. uniform on the unit cir- 
cle for each w G 



In our experiments, we set the amplitude of each nonzero 
coefficient equal to one because the success of l\ minimization 
does not depend on the amplitudes. 

B. Experimental Design 

Our experiments are intended to determine the minimum 
sampling rate R that is necessary to identify a ii'-sparse signal 
with a bandwidth of W Hz. In each trial, we perform the 
following steps: 



60 




Signal Bandwidth Hz (W) 



Fig. 4. Sampling rate as a function of signal bandwidtlj. The sparsity is fixed 
at K = 5. The solid discs mark the lowest sampling rate R that achieves 
successful reconstruction with probability 0.99. The solid line denotes the 
linear least-squares fit _R = IWK log(W/ K + 1) + 4.51 for the data. 

1) Input Signal. A new amplitude vector s is drawn at 
random according to Model (A) in Section |VI-A| The 
amplitudes all have magnitude one. 

2) Random demodulator. A new random demodulator * 
is drawn with parameters K, R, and W. The elements of 
the chipping sequence are independent random variables, 
equally likely to be ±1. 

3) Sampling. The vector of samples is computed using the 
expression y = €>s. 

4) Reconstruction. An estimate s of the amplitude vector 
is computed with IRLS. 

For each triple {K, R, W), we perform 500 trials. We 
declare the experiment a success when s s to machine 
precision, and we report the smallest value of R for which the 
empirical failure probability is less than 1%. 

C. Performance Results 

We begin by evaluating the relationship between the signal 
bandwidth W and the sampling rate R required to achieve a 
high probability of successful reconstruction. Figure |4] shows 
the experimental results for a fixed sparsity of X = 5 as 
W increases from 128 to 2048 Hz, denoted by solid discs. 
The solid line shows the result of a linear regression on the 
experimental data, namely 

R^ imKlog{W/K + I) + 

The variation about the regression line is probably due to 
arithmetic effects that occur when the sampling rate does not 
divide the bandlimit. 



Next, we evaluate the relationship between the sparsity K 
and the sampling rate R. Figure |5] shows the experimental 
results for a fixed chipping rate of = 512 Hz as the sparsity 
K increases from 1 to 64. The figure also shows that the linear 
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Fig. 5. Samping rate as a function of sparsity. The bandlimit is fixed at 
W = 512 Hz. The solid discs mark the lowest sampling rate that achieves 
successful reconstruction with probability 0.99. The solid line denotes the 
linear least-squares fit = l.7lK\og{W/K + 1) + 1.00 for the data. 

regression give a close fit for the data. 

i?= l.7lK\og{W I K + I) + im 
is the empirical trend. 

The regression lines from these two experiments suggest 
that successful reconstruction of signals from Model (A) 
occurs with high probability when the sampling rate obeys 
the bound 

R>\lK\Qg{W/K +\). (7) 

Thus, for a fixed sparsity, the sampling rate grows only 
logarithmically as the Nyquist rate increases. We also note that 
this bound is similar to those obtained for other measurement 
schemes that require fully random, dense matrices [23]. 

Finally, we study the threshold that denotes a change from 
high to low probability of successful reconstruction. This type 
of phase transition is a common phenomenon in compressive 
sampling. For this experiment, the chipping rate is fixed at 
W = 512 Hz, while the sparsity K and sampling R rate vary. 
We record the probability of success as the compression factor 
R/W and the sampling efficiency K/R vary. 

The experimental results appear in Figure |6] Each pixel in 
this image represents the probability of success for the cor- 
responding combination of system parameters. Lighter pixels 
denote higher probability. The dashed line, 

K _ 0.68 

~R ~ \og{W/K + 1) ' 

describes the relationship among the parameters where the 
probability of success drops below 99%. The numerical pa- 
rameter was derived from a linear regression without intercept. 

For reference, we compare the random demodulator with a 
benchmark system that obtains measurements of the amplitude 
vector s by applying a matrix €> whose entries are drawn 
independently from the standard Gaussian distribution. As 
the dimensions of the matrix grow, li minimization methods 
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Fig. 6. Probability of success as a function of sampling efficiency and 
compression factor. The shade of each pixel indicates the probability of 
successful reconstruction for the corresponding combination of parameter 
values. The solid line marks the theoretical threshold for a Gaussian matrix; 
the dashed line traces the 99% success isocline. 

exhibit a sharp phase transition from success to failure. The 
solid line in Figure|6]marks the location of the precipice, which 
can be computed analytically with methods from [22]. 



D. Example: Analog Demodulation 

We constructed these empirical performance trends using 
signals drawn from a synthetic model. To narrow the gap 
between theory and practice, we performed another experiment 
to demonstrate that the random demodulator can successfully 
recover a simple communication signal from samples taken 
below the Nyquist rate. 

In the amplitude modulation (AM) encoding scheme, the 
transmitted signal /am(^) takes the form 

/am (t) = A cos(27rw,t) • (m(t) + C) , (8) 

where ■m{t) is the original message signal, ujc is the carrier 
frequency, and A, C are fixed values. When the original 
message signal m{t) has K nonzero Fourier coefficients, then 
the cosine-modulated signal has only 2K + 2 nonzero Fourier 
coefficients. 

We consider an AM signal that encodes the message ap- 
pearing in Figure |7]^a). The signal was transmitted from a 
communications device using carrier frequency uJc = 8.2 KHz, 
and the received signal was sampled by an ADC at a rate of 
32 KHz. Both the transmitter and receiver were isolated in a 
lab to produce a clean signal; however, noise is still present on 
the sampled data due to hardware effects and environmental 
conditions. 

We feed the received signal into a simulated random de- 
modulator, where we take the Nyquist rate W = 32 KHz and 
we attempt a variety of sampling rates R. We use IRLS to 
reconstruct the signal from the random samples, and we per- 
form AM demodulation on the recovered signal to reconstruct 
the original message. Figures |7jb)-(d) display reconstructions 
for a random demodulator with sampling rates i? = 16 KHz, 
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8 KHz, and 3.2 KHz, respectively. To evaluate the quality 
of the reconstruction, we measure the signal-to-noise ratio 
(SNR) between the message obtained from the received signal 
/am and the message obtained from the output of the random 
demodulator The reconstructions achieve SNRs of 27.8 dB, 
22.3 dB, and 20.9 dB, respectively. 

These results demonstrate that the performance of the ran- 
dom demodulator degrades gracefully as the SNR decreases. 
Note, however, that the system will not function at all unless 
the sampling rate is sufficiently high that we stand below the 
phase transition. 

VII. Theoretical Recovery Results 

We have been able to establish several theoretical guaran- 
tees on the performance of the random demodulator system. 
First, we focus on the setting described in the numerical 
experiments, and we develop estimates for the sampling rate 
required to recover the synthetic sparse signals featured in 



Section VI-C Qualitatively, these bounds almost match the 



system performance we witnessed in our experiments. The 
second set of results addresses the behavior of the system for a 
much wider class of input signals. This theory establishes that, 
when the sampling rate is slightly higher, the signal recovery 
process will succeed — even if the spectrum is not perfectly 
sparse and the samples collected by the random demodulator 
are contaminated with noise. 

A. Recovery of Random Signals 

First, we study when £i minimization can reconstruct ran- 
dom, frequency-sparse signals that have been sampled with 
the random demodulator. The setting of the following theorem 
precisely matches our numerical experiments. 

Theorem 1 (Recovery of Random Signals): Suppose that 
the sampling rate 

R>G [A'logW^ + log^ W] 

and that R divides W . The number C is a positive, universal 
constant. 

Let s be a random amplitude vector drawn according to 
Model (A). Draw an i? x random demodulator matrix €>, 
and let y = 4>s be the samples collected by the random 
demodulator 

Then the solution s to the convex program (|5]l equals s, 
except with probability 0{W^^). 

We have framed the unimportant technical assumption that 
R divides W to simplify the lengthy argument. See Ap- 
pendix [n] for the proof. 

Our analysis demonstrates that the sampling rate R scales 
linearly with the sparsity level K, while it is logarithmic in 
the bandlimit W. In other words, the theorem supports our 
empirical rule ^ for the sampling rate. Unfortunately, the 
analysis does not lead to reasonable estimates for the leading 
constant. Still, we believe that this result offers an attractive 
theoretical justification for our claims about the sampling 
efficiency of the random demodulator 

The theorem also suggests that there is a small startup cost. 
That is, a minimal number of measurements is required before 



the demodulator system is effective. Our experiments were 
not refined enough to detect whether this startup cost actually 
exists in practice. 

B. Uniformity and Stability 

Although the results described in the last section provide 
a satisfactory explanation of the numerical experiments, they 
are less compelling as a vision of signal recovery in the 
real world. Theorem [T] has three major shortcomings. First, 
it is unreasonable to assume that tones are located at random 
positions in the spectrum, so Model (A) is somewhat artificial. 
Second, typical signals are not spectrally sparse because 
they contain background noise and, perhaps, irrelevant low- 
power frequencies. Finally, the hardware system suffers from 
nonidealities, so it only approximates the linear transformation 
described by the matrix M. As a consequence, the samples 
are contaminated with noise. 

To address these problems, we need an algorithm that 
provides uniform and stable signal recovery. Uniformity means 
that the approach works for all signals, irrespective of the 
frequencies and phases that participate. Stability means that the 
performance degrades gracefully when the signal's spectrum 
is not perfectly sparse and the samples are noisy. Fortunately, 
the compressive sampling community has developed several 
powerful signal recovery techniques that enjoy both these 
properties. 

Conceptually, the simplest approach is to modify the convex 
program (|5| to account for the contamination in the sam- 
ples [24]. Suppose that s e is an arbitrary amplitude 
vector, and let u £ be an arbitrary noise vector that is 
known to satisfy the bound ||i>'||2 < V- Assume that we have 
acquired the duty samples 



y 



To produce an approximation of s, it is natural to solve the 
noise-aware optimization problem 



s — arg mm \ \v\\^ 



subject to W^v — yW^ < rj. (9) 



As before, this problem can be formulated as a second- 
order cone program, which can be optimized using various 
algorithms [24]. 

We have established a theorem that describes how the 
optimization-based recovery technique performs when the 
samples are acquired using the random demodulator system. 
An analogous result holds for the CoSaMP algorithm, a 
greedy pursuit method with superb guarantees on its run- 
time [21]. 

Theorem 2 (Recovery of General Signals): Suppose that 
the sampling rate 

R > Cinog^T4^ 

and that R divides W. Draw an Rx W random demodulator 
matrix Then the following statement holds, except with 
probability 0{W-^). 

Suppose that s is an arbitrary amplitude vector and v> is a 
noise vector with \\i'\\2 < V- Let y ^ + v> he the noisy 
samples collected by the random demodulator. 
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Fig. 7. Simulated acquisition and reconstruction of an AM signal, (a) Message reconstructed from signal sampled at Nyquist rate W = 32 KHz. (b) Message 
reconstructed from the output of a simulated random demodulator running at sampling rate R = 16 KHz (SNR = 27.8 dB). (c) Reconstruction with R = 8 
KHz (SNR = 22.3 dB). (d) Reconstruction with R = 3.2 KHz (SNR = 20.9 dB). 



Then every solution s to the convex program (|9]l approxi- 
mates the target vector s: 



s|l2 ^ C max <^ rj, 



\s - Sk\ 



(10) 



where sk is a best ivT-sparse approximation to s with respect 
to the £i norm. 

The proof relies on the restricted isometry property (RIP) 
of Candes-Tao [25]. To establish that the random demodulator 
verifies the RIP, we adapt ideas of Rudelson-Vershynin [26]. 
Turn to Appendix [III] for the argument. 

Theorem|2]is not as easy to grasp as Theorem[T[ so we must 
spend some time to unpack its meaning. First, observe that 
the sampling rate has increased by several logarithmic factors. 
Some of these factors are probably parasitic, a consequence of 
the techniques used to prove the theorem. It seems plausible 
that, in practice, the actual requirement on the sampling rate 
is closer to 

R>GK\og^W. 
This conjecture is beyond the power of current techniques. 



The earlier result. Theorem [T| suggests that we should draw 
a new random demodulator each time we want to acquire a sig- 
nal. In contrast. Theorem |2] shows that, with high probability, a 
particular instance of the random demodulator acquires enough 
information to reconstruct any amplitude vector whatsoever 
This aspect of Theorem |2] has the practical consequence that 
a random chipping sequence can be chosen in advance and 
fixed for aU time. 

Theorem |2] does not place any specific requirements on the 
amplitude vector, nor does it model the noise contaminating 
the signal. Indeed, the approximation error bound ( [TOj i holds 
generally. That said, the strength of the error bound depends 
substantially on the structure of the amplitude vector. When s 
happens to be i^-sparse, then the second term in the maximum 
vanishes. So the convex programming method still has the 
ability to recover sparse signals perfectly. 

When s is not sparse, we may interpret the bound ( fTO] ) 
as saying that the computed approximation is comparable 
with the best X-sparse approximation of the amplitude vector 
When the amplitude vector has a good sparse approximation, 
then the recovery succeeds well. When the amplitude vector 
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does not have a good approximation, then signal recovery 
may fail completely. For this class of signal, the random 
demodulator is not an appropriate technology. 

Initially, it may seem difficult to reconcile the two norms 
that appear in the error bound ( [TO] i. In fact, this type of mixed- 
norm bound is structurally optimal [27], so we cannot hope 
to improve the £i norm to an £2 norm. Nevertheless, for an 
important class of signals, the scaled £1 norm of the tail is 
essentially equivalent to the £2 norm of the tail. 

Let p £ (0, 1). We say that a vector s is p-compressible 
when its sorted components decay sufficiently fast: 

<k-^'P for fc= 1,2,3,.... 



(11) 



Harmonic analysis shows that many natural signal classes 
are compressible [28]. Moreover, the windowing scheme of 



Section VIII results in compressible signals. 



The critical fact is that compressible signals are well ap- 
proximated by sparse signals. Indeed, it is straightforward to 
check that 

1 



s - Sk\ 



\s - Sk\ 



< 



< 



1/p-l 

1 



ifl/2-l/p^ 



Note that the constants on the right-hand side depend only on 
the level p of compressibility. 

For a p-compressible signal, the error bound ( [TO) i reads 

||s-s||2 <Cmax|77,i^i/2-i/pJ_ 

We see that the right-hand side is comparable with the £2 norm 
of the tail of the signal. This quantitive conclusion reinforces 
the intuition that the random demodulator is efficient when the 
amplitude vector is well approximated by a sparse vector. 

C. Extension to Bounded Orthobases 

We have shown that the random demodulator is effective 
at acquiring signals that are spectrally sparse or compressible. 
In fact, the system can acquire a much more general class of 
signals. The proofs of Theorem [T] and Theorem |2] indicate that 
the crucial feature of the sparsity basis F is its incoherence 
with the Dirac basis. In other words, we exploit the fact 
that the entries of the matrix F have small magnitude. This 
insight allows us to extend the recovery theory to other sparsity 
bases with the same property. We avoid a detailed exposition. 
See [29] for more discussion of this type of result. 

VIII. Windowing 

We have shown that the random demodulator can acquire 
periodic multitone signals. The effectiveness of the system for 
a general signal / depends on how closely we can approximate 
/ on the time interval [0, 1) by a periodic multitone signal. In 
this section, we argue that many common types of nonperiodic 
signals can be approximated well using windowing techniques. 
In particular, this approach allows us to capture nonharmonic 
sinusoids and multiband signals with small total bandwidth. 
We offer a brief overview of the ideas here, deferring a detailed 
analysis for a later publication. 



A. Nonharmonic Tones 

First, there is the question of capturing and reconstructing 
nonharmonic sinusoids, signals of the form 



fit) 



-2-K\uj' t/W 



z. 



It is notoriously hard to approximate / on [0,1) using 
harmonic sinusoids, because the coefficients in (|2]l are 
essentially samples of a sine function. Indeed, the signal Jk, 
defined as the best approximation of / using K harmonic 
frequencies, satisfies only 



11/ 



(12) 



/^f IIl2[0,1) 

a painfully slow rate of decay. 

There is a classical remedy to this problem. Instead of 
acquiring / directly, we acquire a smoothly windowed version 
of /. To fix ideas, suppose that ijj is a window function that 
vanishes outside the interval [0, 1). Instead of measuring /, 
we measure g — ip ■ f. The goal is now to reconstruct this 
windowed signal g. 

As before, our ability to reconstruct the windowed signal 
depends on how well we can approximate it using a periodic 
multitone signal. In fact, the approximation rate depends on the 
smoothness of the window. When the continuous-time Fourier 
transform -ip decays like oj^^ for some r > 1, we have that gx, 
the best approximation of g using K harmonic frequencies, 
satisfies 

which decays to zero much faster than the error bound ( [T2] l. 
Thus, to achieve a tolerance of e, we need only about 



-l/(,-l/2) 



terms. 



In summary, windowing the input signal allows us to closely 
approximate a nonharmonic sinusoid by a periodic multitone 
signal; g will be compressible as in ( [TT| i. If there are multiple 
nonharmonic sinusoids present, then the number of harmonic 
tones required to approximate the signal to a certain tolerance 
scales linearly with the number of nonharmonic sinusoids. 

B. Multiband Signals 

Windowing also enables us to treat signals that occupy a 
small band in frequency. Suppose that / is a signal whose 
continuous-time Fourier transform / vanishes outside an in- 
terval of length B. The Fourier transform of the windowed 
signal g — ip ■ f can be broken into two pieces. One part is 
nonzero only on the support of /; the other part decays like 
w"'" away from this interval. As a result, we can approximate 
g to a tolerance of e using a multitone signal with about 
B + terms. 

For a multiband signal, the same argument applies to 
each band. Therefore, the number of tones required for the 
approximation scales linearly with the total bandwidth. 

C. A Vista on Windows 

To reconstruct the original signal / over a long period of 
time, we must multiply the signal with overlapping shifts of 
a window ip. The window needs to have additional properties 
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for this scheme to work. Suppose that the set of half-integer 
shifts of 'ip form a partition of unity, i.e., 



— Random Demodulator back-end ADC 
— ADC state of the art 1999 



After measuring and reconstructing gk{t) = ip{t — k/2) ■ 
f{t) on each subinterval, we simply add the reconstructions 
together to obtain a complete reconstruction of /. 

This windowing strategy relies on our ability to measure the 
windowed signal t/j-f. To accomplish this, we require a slightly 
more complicated system architecture. To perform the win- 
dowing, we use an amplifier with a time-varying gain. Since 
subsequent windows overlap, we will also need to measure 
gk and gk+i simultaneously, which creates the need for two 
random demodulator channels running in parallel. In principle, 
none of these modifications diminishes the practicality of the 
system. 

IX. Discussion 

This section develops some additional themes that arise 
from our work. First, we discuss the SNR performance and 
SWAP profile of the random demodulator system. Afterward, 
we present a rough estimate of the time required for signal 
recovery using contemporary computing architectures. We 
conclude with some speculations about the technological im- 
pact of the device. 

A. SNR Performance 

A significant benefit of the random demodulator is that 
we can control the SNR performance of the reconstruction 
by optimizing the sampling rate of the back-end ADC, as 



demonstrated in Section VI-D When input signals are spec- 



trally sparse, the random demodulator system can outperform 
a standard ADC by a substantial margin. 

To quantify the SNR performance of an ADC, we consider 
a standard metric, the effective number of bits (ENOB), which 
is roughly the actual number of quantization levels possible at 
a given SNR. The ENOB is calculated as 



ENOB = (SNR - 1.76)/6.02, 



(13) 



where the SNR is expressed in dB. We estimate the ENOB 
of a modern ADC using the information in Walden's 1999 
survey [12]. 

When the input signal has sparsity K and bandlimit W, we 
can acquire the signal using a random demodulator running 
at rate R, which we estimate using the relation ([T]). As noted, 
this rate is typically much lower than the Nyquist rate. Since 
low-rate ADCs exhibit much better SNR performance than 
high-rate ADCs, the demodulator can produce a higher ENOB. 

Figure [8] charts how much the random demodulator im- 
proves on state-of-the-art ADCs when the input signals are 
assumed to be sparse. The first panel. Figure |8ja), compares 
the random demodulator with a standard ADC for signals with 
the sparsity K = 10^ as the Nyquist rate W varies up to 
lO^'^ Hz. In this setting, the clear advantages of the random 
demodulator can be seen in the slow decay of the curve. The 
second panel. Figure [8]^b), estimates the ENOB as a function 
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Fig. 8. ENOB for random demodulator versus a standard ADC. The solid 
lines represent state-of-the-art ADC performance in 1999. The dashed lines 
represent the random demodulator performance using an ADC running at 
the sampling rate R suggested by our empirical work, (a) Performance as a 
function of the bandlimit W with fixed sparsity K = 10^. (b) Peiformance 
as a function of the sparsity K with fixed bandlimit W = 10^. Note that the 
performance of a standard ADC does not depend on K. 



of the sparsity K when the bandlimit is fixed at = 10^ Hz. 
Again, a distinct advantage is visible. But we stress that this 
improvement is contingent on the sparsity of the signal. 



Although this discussion suggests that the SNR behavior of 
the random demodulator represents a clear improvement over 
traditional ADCs, we have neglected some important factors. 
The estimated performance of the demodulator does not take 
into account SNR degradations due to the chain of hardware 
components upstream of the sampler. These degradations are 
caused by factors such as the nonlinearity of the multiplier 
and jitter of the pseudorandom modulation signal. Thus, it 
is critical to choose high-quality components in the design. 
Even under this constraint, the random demodulator may have 
a more attractive feasibility and cost profile than a high-rate 
ADC. 
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B. Power Consumption 

In some applications, the power consumption of the signal 
acquisition system is of tantamount importance. One widely 
used figure of merit for ADCs is the quantity 
2ENOB J 

diss 

where fg is the sampling rate and Pdiss is the power dissipation 
[30]. We propose a slightly modified version of this figure of 
merit to compare compressive ADCs with conventional ones: 

2ENOB-l^ 
^diss(^) 

where we simply replace the sampling rate with the acquisition 
bandlimit W/2 and express the power dissipated as a function 
of the actual sampling rate. For the random demodulator, 

oENOB-lw 
FOM« — - 

Pdiss(1.7i^log(W^/X)) 

on account of Q. The random demodulator incurs a penalty 
in the effective number of bits for a given signal, but it may 
require significantly less power to acquire the signal. This 
effect becomes pronounced as the bandlimit W becomes large, 
which is precisely where low-power ADCs start to fade. 

C. Computational Resources Needed for Signal Recovery 

Recovering randomly demodulated signals in real-time 
seems like a daunting task. This section offers a back-of-the- 
envelope calculation that supports our claim that current digital 
computing technology is nearly adequate to achieve real-time 
recovery rates for signals with frequency components in the 
gigahertz range. 

The computational cost of a sparse recovery algorithm 
is dominated by repeated application of the system matrix 
4> and its transpose, in both the case where the algorithm 



solves a convex optimization problem (Section V-Ai or per- 
forms a greedy pursuit (Section V-B 1. Recently developed 



algorithms [31], [32], [33], [21], [17] typically produce good 
solutions with a few hundred applications of the system matrix. 

For the random demodulator, the matrix <1> is a composition 
of three operations: 

1) a length- discrete Fourier transform, which requires 
0(M^log W) multiplications and additions (via an FFT), 

2) a pointwise multiplication, which uses W multiplies, and 

3) a calculation of block sums, which involves W addi- 
tions. 

Of these three steps, the FFT is by far the most expensive. 
Roughly speaking, it takes several hundred FFTs to recover a 
sparse signal with Nyquist rate W from measurements made 
by the random demodulator. 

Let us fix some numbers so we can estimate the amount 
of computational time required. Suppose that we want the 
digital back-end to output 2'^*' (or, about 1 billion) samples 
per second. Assume that we compute the samples in blocks of 
size 2^** = 16, 384 and that we use 200 FFTs for each block. 
Since we need to recover 2^^ blocks per second, we have to 
perform about 13 milUon 16K-point FFTs in one second. 



This amount of computation is substantial, but it is not 
entirely unrealistic. For example, a recent benchmark [34] 
reports that the Cell processor performs a 16K-point FFT at 
37.6 Gflops/s, which translate^] to about 1.5 x 10"^ s. The 
nominal time to recover a single block is around 3 ms, so 
the total cost of processing 2^^ blocks is around 200 s. Of 
course, the blocks can be recovered in parallel, so this factor 
of 200 can be reduced significantly using parallel or multicore 
architectures. 

The random demodulator offers an additional benefit. The 
samples collected by the system automatically compress a 
sparse signal into a minimal amount of data. As a result, 
the storage and communication requirements of the signal 
acquisition system are also reduced. Furthermore, no extra 
processing is required after sampling to compress the data, 
which decreases the complexity of the required hardware and 
its power consumption. 

D. Technological Impact 

The easiest conclusion to draw from the bounds on SNR 
performance is that the random demodulator may allow us to 
acquire high-bandwidth signals that are not accessible with 
current technologies. These applications still require high- 
performance analog and digital signal processing technology, 
so they may be several years (or more) away. 

A more subtle conclusion is that the random demodulator 
enables us to perform certain signal processing tasks using 
devices with a more favorable size, weight, and power (SWAP) 
profile. We believe that these applications will be easier to 
achieve in the near future because suitable ADC and DSP 
technology is already available. 

X. Related Work 

Finally, we describe connections between the random de- 
modulator and other approaches to sampling signals that 
contain limited information. 

A. Origins of Random Demodulator 

The random demodulator was introduced in two earlier 
papers [35], [36], which offer preliminary work on the system 
architecture and experimental studies of the system's behavior 
The current paper can be seen as an expansion of these 
articles, because we offer detailed information on performance 
trends as a function of the signal bandlimit, the sparsity, and 
the number of samples. We have also developed theoretical 
foundations that support the empirical results. This analysis 
was initially presented by the first author at SampTA 2007. 

B. Compressive Sampling 

The most direct precedent for this paper is the theory of 
compressive sampling. This field, initiated in the papers [16], 
[37], has shown that random measurements are an efficient and 
practical method for acquiring compressible signals. For the 

'a single n-point FFT requires around 2.5nlog2 n flops. 
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most part, compressive sampling has concentrated on finite- 
length, discrete-time signals. One of the innovations in this 
work is to transport the continuous-time signal acquisition 
problem into a setting where established techniques apply. In- 
deed, our empirical and theoretical estimates for the sampling 
rate of the random demodulator echo established results from 
the compressive sampling literature regarding the number of 
measurements needed to acquire sparse signals. 

C. Comparison with Nonuniform Sampling 

Another line of research [16], [38] in compressive sampling 
has shown that frequency-sparse, periodic, bandlimited signals 
can be acquired by sampling nonuniformly in time at an 
average rate comparable with ([TSj. This type of nonuniform 
sampling can be implemented by changing the clock input to 
a standard ADC. Although the random demodulator system 
involves more components, it has several advantages over 
nonuniform sampling. 

First, nonuniform samplers are extremely sensitive to timing 
jitter. Consider the problem of acquiring a signal with high- 
frequency components by means of nonuniform sampling. 
Since the signal values change rapidly, a small error in the 
sampling time can result in an erroneous sample value. The 
random demodulator, on the other hand, benefits from the 
integrator, which effectively lowers the bandwidth of the input 
into the ADC. Moreover, the random demodulator uses a 
uniform clock, which is more stable to generate. 

Second, the SNR in the measurements from a random 
demodulator is much higher than the SNR in the measurements 
from a nonuniform sampler Suppose that we are acquiring a 
single sinusoid with unit amplitude. On average, each sample 
has magnitude 2~^/^, so if we take W samples at the Nyquist 
rate, the total energy in the samples is W/2. If we take 
R nonuniform samples the total energy will be on average 
R/2. In contrast, if we take R samples with the random 
demodulator, each sample has an approximate magnitude of 
y/W/R, SO the total energy in the samples is about W. In 
consequence, signal recovery using samples from the random 
demodulator is more robust against additive noise. 

D. Relationship with Random Convolution 

As illustrated in Figure [2] the random demodulator can be 
interpreted in the frequency domain as a convolution of a 
sparse signal with a random waveform, followed by lowpass 
filtering. An idea closely related to this — convolution with a 
random waveform followed by subsampling — has appeared in 
the compressed sensing literature [39], [40], [41], [42]. In fact, 
if we replace the integrator with an ideal lowpass filter, so that 
we are in essence taking R consecutive samples of the Fourier 
transform of the demodulated signal, the architecture would be 
very similar to that analyzed in [40], with the roles of time 
and frequency reversed. The main difference is that [40] relies 
on the samples themselves being randomly chosen rather than 
consecutive (this extra randomness allows the same sensing 
architecture to be used with any sparsity basis). 



E. Multiband Sampling Theory 

The classical approach to recovering bandlimited signals 
from time samples is, of course, the well-known method 
associated with the names of Shannon, Nyquist, Whittaker, 
Kotel'nikov, and others. In the 1960s, Landau demonstrated 
that stable reconstruction of a bandlimited signal demands a 
sampling rate no less than the Nyquist rate [4]. Landau also 
considered multiband signals, those bandlimited signals whose 
spectrum is supported on a collection of frequency intervals. 
Roughly speaking, he proved that a sequence of time samples 
cannot stably determine a multiband signal unless the average 
sampling rate exceeds the measure of the occupied part of the 
spectrum [43]. 

In the 1990s, researchers began to consider practical sam- 
pling schemes for acquiring multiband signals. The earliest 
effort is probably a paper of Feng and Bresler, who assumed 
that the band locations were known in advance [44]. See 
also the subsequent work [45]. Afterward, Mishali and Eldar 
showed that related methods can be used to acquire multiband 
signals without knowledge of the band locations [46]. 

Researchers have also considered generalizations of the 
multiband signal model. These approaches model signals as 
arising from a union of subspaces. See, for example, [47], 
[48] and their references. 

F. Parallel Demodulator System 

Very recently, it has been shown that a bank of random 
demodulators can be used for blind acquisition of multiband 
signals [49], [50]. This system uses different parameter settings 
from the system described here, and it processes samples in a 
rather different fashion. This work is more closely connected 
with multiband sampling theory than with compressive sam- 
pling, so we refer the reader to the papers for details. 

G. Finite Rate of Innovation 

Vetterli and collaborators have developed an alternative 
framework for sub-Nyquist sampling and reconstruction, 
called finite rate of innovation sampling, that passes an analog 
signal / having K degrees of freedom per second through a 
linear time-invariant filter and then samples at a rate above 
2K Hz. Reconstruction is then performed by rooting a high- 
order polynomial [51], [52], [53], [54]. While this sampling 
rate is less than the 0{K log{W/ K)) required by the random 
demodulator, a proof of the numerical stability of this method 
remains elusive. 

H. Sublinear FFTs 

During the 1990s and the early years of the present decade, 
a separate strand of work appeared in the literature on theoreti- 
cal computer science. Researchers developed computationally 
efficient algorithms for approximating the Fourier transform 
of a signal, given a small number of random time samples 
from a structured grid. The earliest work in this area was 
due to Kushilevitz and Mansour [55], while the method was 
perfected by Gilbert et al. [56], [57]. See [38] for a tutorial on 
these ideas. These schemes provide an alternative approach to 
sub-Nyquist ADCs [58], [59] in certain settings. 
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Appendix I 
The Random Demodulator Matrix 

This appendix collects some facts about the random demod- 
ulator matrix that we use to prove the recovery bounds. 

A. Notation 

Let us begin with some notation. First, we abbreviate 
IW} = {1,2,..., W}. We write * for the complex conjugate 
transpose of a scalar, vector, or matrix. The symbol ||-|| 
denotes the spectral norm of a matrix, while |l-|lp indicates the 
Frobenius norm. We write IHImax the maximum absolute 
entry of a matrix. Other norms will be introduced as necessary. 
The probability of an event is expressed as P{-}, and we use 
E for the expectation operator. For conditional expectation, 
we use the notation Ex Z, which represents integration with 
respect to X, holding all other variables fixed. For a random 
variable Z, we define its Lp norm 

EP{Z) = (EIZI*")^/^. 

We sometimes omit the parentheses when there is no possibil- 
ity of confusion. Finally, we remind the reader of the analyst's 
convention that roman letters c, C, etc. denote universal 
constants that may change at every appearance. 

B. Background 

This section contains a potpourri of basic results that will 
be helpful to us. 

We begin with a simple technique for bounding the moments 
of a maximum. Consider an arbitrary set {Zi,. . . ,Zn} of 
random variables. It holds that 



where the optimal constant 



EP(maxjZj) < N'^/p maDijW{Zj) 
To check this claim, simply note that 



(14) 



[Emaxj \Zj 



[X;.E|^ir]'^''<[A^-max,-E|Z,-n 



1/p 



In many cases, this inequality yields essentially sharp results 
for the appropriate choice of p. 

The simplest probabilistic object is the Rademacher random 
variable, which takes the two values ±1 with equal likelihood. 
A sequence of independent Rademacher variables is referred to 
as a Rademacher sequence. A Rademacher series in a Banach 
space X is a sum of the form 

^j=i ^'^^ 

where {xj} is a sequence of points in X and {^j} is an 
(independent) Rademacher sequence. 

For Rademacher series with scalar coefficients, the most 
important result is the inequality of Khintchine. The following 
sharp version is due to Haagerup [60]. 

Proposition 3 (Khintchine): Let p > 2. For every sequence 
{ttj} of complex scalars, 

1/2 



p\ 



i/p 



< 2i/2fe-0-5yp. 



_2P/2(p/2)!_ 

This inequaUty is typically estabUshed only for real scalars, 
but the real case implies that the complex case holds with the 

same constant. 

Rademacher series appear as a basic tool for studying 
sums of independent random variables in a Banach space, as 
illustrated in the following proposition [61, Lem. 6.3]. 

Proposition 4 (Symmetrization): Let {Zj} be a finite se- 
quence of independent, zero-mean random variables taking 
values in a Banach space X. Then 



EP 



IE, 



< 2EP 



X 



where is a Rademacher sequence independent of {Zj}. 

In words, the moments of the sum are controlled by the 
moments of the associated Rademacher series. The advantage 
of this approach is that we can condition on the choice of {Zj} 
and apply sophisticated methods to estimate the moments of 
the Rademacher series. 

Finally, we need some facts about symmetrized random 
variables. Suppose that Z is a zero-mean random variable 
that takes values in a Banach space X. We may define the 
symmetrized variable Y = Z—Z', where Z' is an independent 
copy of Z. The tail of the symmetrized variable Y is closely 
related to the tail of Z. Indeed, 

P{||Z|l^>2E||Z||^ + 4<P{||r||^>4. (15) 

This relation follows from [61, Eqn. (6.2)] and the fact that 
Med(y) < 2Ey for every nonnegative random variable Y . 

C. The Random Demodulator Matrix 

We continue with a review of the structure of the random 
demodulator matrix, which is the central object of our affec- 
tion. Throughout the appendices, we assume that the sampling 
rate R divides the bandlimit W. That is, 

W/R e z. 

Recall that the i?x random demodulator matrix * is defined 
via the product 

* = HDF. 

We index the R rows of the matrix with the Roman letter r, 
while we index the W columns with the Greek letters a, w. It 
is convenient to summarize the properties of the three factors. 

The matrix F is a permutation of the. W x W discrete 
Fourier transform matrix. In particular, it is unitary, and each 
of its entries has magnitude W~^/'^. 

The matrix D is a random W x W diagonal matrix. The 
entries of D are the elements ei,...,ew of the chipping 
sequence, which we assume to be a Rademacher sequence. 
Since the nonzero entries of D are ±1, it follows that the 
matrix is unitary. 

The matrix H is an Rx W matrix with 0-1 entries. Each 
of its rows contains a block of W/R contiguous ones, and the 
rows are orthogonal. These facts imply that 



|iJ|| = ^W/R. 



(16) 
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To keep track of the locations of the ones, the following 
notation is useful. We write 



J 



when (r - 1)W/R <j< rW/R. 



Thus, each value of r is associated with W/R values of j. 
The entry hrj = 1 if and only if j ~ r. 

The spectral norm of $ is determined completely by its 
structure. 



*|| = \\HDF\\ = \\H\\ = y/W/R 



(17) 



because F and D are unitary. 

Next, we present notations for the entries and columns of 
the random demodulator For an index u G IWJ, we write ip^ 
for the Loi\\ column of Expanding the matrix product, we 
can express 

V>u, = 2^^^^ejfj^hj (18) 

where fj,^ is the (j, uj) entry of F and hj is the jth column of 
H. Similarly, we write (pruj for the (r, ui) entry of the matrix 
The entry can also be expanded as a sum 



E 



ifj' 



(19) 



Finally, given a set J7 of column indices, we define the 
column submatrix $n of $ whose columns are listed in il. 

D. A Componentwise Bound 

The first key property of the random demodulator matrix is 
that its entries are small with high probability. We apply this 
bound repeatedly in the subsequent arguments. 

Lemma 5: Let $ be an i? x random demodulator matrix. 
When 2 < p < 4 log W , we have 



11*11 < 

1 1 M max — 



&\ogW 



R 



Furthermore, 



' 1 1 max 



\Q\ogW 



R 



< w- 



Proof: As noted in ( [T9] l, each entry of 4> is a Rademacher 
series; 



E 



ifj' 



Observe that 

- ' W ^ R' 

because the entries of F all share the magnitude W^^^"^. 
Khintchine's inequality. Proposition |3] provides a bound on 
the moments of the Rademacher series: 

Cp 



< 



where Cp < 2° '^-'e-°-^ y^. 

We now compute the moments of the maximum entry. Let 

= II* II max = max^,^ \ipr^\. 

Select q = max{p, 41og VF}, and invoke Holder's inequality. 



E'' M < E« max. 



Inequality ([14]) yields 

EPM < (W)i/«max,,„E' |(^™| . 
Apply the moment bound for the individual entries to reach 



R 

Since i? < W and 5 > 41ogW^, we have {RWy/'^ < ^ 
Simplify the constant to discover that 



E^" M < 2 



1.25 



logW 
R 



A numerical bound yields the first conclusion. 

To obtain a probability bound, we apply Markov's inequal- 
ity, which is the standard device. Indeed, 

n9 

'{M > u} = P{A/« > w«} < 



Choose u = c°-^^ E'' M to obtain 
P i M > 2—e 



i.25„o.25. /logVF 



R 



< e 



' log W 



Another numerical bound completes the demonstration. ■ 

E. The Gram Matrix 

Next, we develop some information about the inner products 
between columns of the random demodulator Let a and lo 
be column indices in Using the expression ([TSj for the 

columns, we find that 

where we have abbreviated r/jk — {hk, hj). Since rjjj = 1, 
the sum of the diagonal terms is 



E , fjafji^ — 



1, a = UJ 



'J — I 0, a ^ UJ 
because the columns of F are orthonormal. Therefore, 



SjSkVjkfjafki. 



where 6au is the Kronecker delta. 

The Gram matrix 4>*4> tabulates these inner products. As 
a result, we can express the latter identity as 



where 



'^atj / , . ^i^kVjkfjafkijj- 



It is clear that E X = 0, so that 

E*** = I. 



(20) 



We interpret this relation to mean that, on average, the columns 
of $ form an orthonormal system. This ideal situation cannot 
occur since 4> has more columns than rows. The matrix X 
measures the discrepancy between reality and this impossible 
dream. Indeed, most of argument can be viewed as a collection 
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of norm bounds on X that quantify different aspects of this 
deviation. 

Our central resuh provides a uniform bound on the entries 
of X that holds with high probability. In the sequel, we exploit 
this fact to obtain estimates for several other quantities of 
interest. 

Lemma 6: Suppose that R> 2 log W. Then 



l^ll„.ax>C 



R I - 



Proof: The object of our attention is 

= ll^llmax = max ^ SjekVjkfjafL 
a,uj 1^ — j¥^k 

A random variable of this form is called a second-order 
Rademacher chaos, and there are sophisticated methods avail- 
able for studying its distribution. We use these ideas to develop 
a bound on E'" M for p = 2 log W . Afterward, we apply 
Markov's inequality to obtain the tail bound. 

For technical reasons, we must rewrite the chaos before we 
start making estimates. First, note that we can express the 
chaos in a more symmetric fashion: 



M ■ 



It is also simpler to study the real and imaginary parts of the 
chaos separately. To that end, define 



77,fc- iRe(/,„/*^ + A.„/;^), 
0, 



With this notation, 

M < max \ y ejEkCi^k 

a,uj \^ — ^j;k 

= Mr, + Mi^. 



j ^ k 
j = k 



■ max > EjEkb' 



We focus on the real part since the imaginary part receives an 
identical treatment. 

The next step toward obtaining a tail bound is to decouple 
the chaos. Define 



Y 



max \y 



£j£kajk 



where {e'j} is an independent copy of {sj}. Standard decou- 
pling results [62, Prop. 1.9] imply that 

So it suffices to study the decoupled variable Y. 

To approach this problem, we first bound the moments of 
each random variable that participates in the maximum. Fix a 
pair {a, uj) of frequencies. We omit the superscript a and oj 
to simplify notations. Define the auxiliary random variable 

Z — Zauj — I y ] . ^ £'j£k0.jk 

Construct the matrix A = [ajk\- As we will see, the variation 
of Z is controlled by spectral properties of the matrix A. 



For that purpose, we compute the Frobenius norm and 
spectral norm of A. Each of its entries satisfies 



, , 1 
\ajk\ < ^Vjk 



1 

W 



{H*H)jk 



owing to the fact that the entries of F are uniformly bounded 
by W~^l'^. The structure of H implies that H* H is a 
symmetric, 0-1 matrix with exactly W/R nonzero entries in 
each of its W rows. Therefore, 



\H*H\\ 



1 

W 



W ■ 



w 

~R 



1 



R 



The spectral norm of A is bounded by the spectral norm of 
its entrywise absolute value abs(A). Applying the fact ( fTS] ) 
that \\H\\ = ^W/R, we obtain 



\A\\ < ||abs(A)|| < 



1 1,1 
— \\H*H\\ = — llifll^ ^ -. 

W" " W " " R 

Let us emphasize that the norm bounds are uniform over all 
pairs {a,Lu) of frequencies. Moreover, the matrix B = [bjk] 
satisfies identical bounds. 

To continue, we estimate the mean of the variable Z. This 
calculation is a simple consequence of Holder's inequality: 

21 1/2 



EZ < (EZ^)^/^ = 



J - \^k 



Sk^jk 



1 

r' 



1/2 



1/2 



I^IIf = 



Chaos random variables, such as Z, concentrate sharply 
about their mean. Deviations of Z from its mean are controlled 
by two separate variances. The probability of large deviation 
is determined by 

C^ = sup||„|| 1 V-j Uj^kajk 
while the probability of a moderate deviation depends on 



y = Esup||„||^^i \2_^^^Ujekajk 
To compute the first variance U, observe that 



U : 



|A||<i. 
" - R 



The second variance is not much harder. Using Jensen's 
inequality, we find 

21 1/2 

£k0.jk 



Ej |Efc " 



< 



jk 



1/2 



1 



R 



We are prepared to appeal to the following theorem which 
bounds the moments of a chaos variable [63, Cor 2]. 

Theorem 7 (Moments of chaos): Let Z he a decoupled, 
symmetric, second-order chaos. Then 



EP\Z-EZ\ <K[^V+pU] 



for each p> I. 
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Substituting the values of the expectation and variances, we 
reach 







< K 




P' 













The content of this estimate is clearer, perhaps, if we split the 
bound at p — R. 



\2Kp/R, 



p<R 
p> R. 



In words, the variable Z exhibits subgaussian behavior in the 
moderate deviation regime, but its tail is subexponential. 

We are now prepared to bound the moments of the max- 
imum of the ensemble {Zauj}- When p — 21ogM^ < R, 
inequality ( |T4] i yields 

1 



E^' max 



.UJ 

~P 



1 



< e • 2K. 

Recalling the definitions of Y and Zaui, we reach 
EPy = EfmaxZ„^ < CW-|— . 

a,^ V R 

In words, we have obtained a moment bound for the decoupled 
version of the real chaos Mro- 

To complete the proof, remember that E^ Mro < 4 E^ F. 
Therefore, 

llogW 



Mrc < 4C 



R 



An identical argument establishes that 



llogW 

Since M < Mro + Afim, it follows inexorably that 



E^" M < 8C 



\QgW 



R 



Finally, we invoke Markov's inequality to obtain the tail 
bound 



M > . 8C I < e-°-5p = W~\ 



R 



This endeth the lesson. 



F. Column Norms and Coherence 

Lemma [6] has two important and immediate consequences. 
First, it implies that the columns of $ essentially have unit 
norm. 

Theorem 8 (Column Norms): Suppose the sampling rate 
R > Cd-^logW. 
An R X W random demodulator $ satisfies 



l|2 



> (5| < W-\ 



Lemma |6] also shows that the coherence of the random 
demodulator is small. The coherence, which is denoted by ^, 



bounds the maximum inner product between distinct columns 
of and it has emerged as a fundamental tool for establishing 
the success of compressive sampling recovery algorithms. 
Rigorously, 

H = max|(¥?Q, (p^) \ . 

We have the following probabilistic bound. 

Theorem 9 (Coherence): Suppose the sampling rate 

R > 21ogT^. 

An R X W random demodulator $ satisfies 



logH^ 



R 



< w- 



For a general R x W matrix with unit-norm columns, we 
can verify [64, Thm. 2.3] that its coherence 

R' 
W 



> 



1 



1 



1 



R 



Since the columns of the random demodulator are essentially 
normalized, we conclude that its coherence is nearly as small 
as possible. 

Appendix II 
Recovery for the Random Phase Model 

In this appendix, we establish Theorem [T] which shows that 
£i minimization is likely to recover a random signal drawn 



from Model (A). Appendix III develops results for general 
signals. 

The performance of £i minimization for random signals de- 
pends on several subtle properties of the demodulator matrix. 
We must discuss these ideas in some detail. In the next two 
sections, we use the bounds from Appendix [I] to check that the 
required conditions are in force. Afterward, we proceed with 
the demonstration of Theorem [T] 

A. Cumulative Coherence 

The coherence measures only the inner product between 
a single pair of columns. To develop accurate results on the 
performance of £i minimization, we need to understand the 
total correlation between a fixed column and a collection of 
distinct columns. 

Let ^ he m RxW matrix, and let be a subset of . 
The local 2-cumulative coherence of the set Q. with respect to 
the matrix $ is defined as 

9l 1/2 

^2(f^) = max > ^^KVa, ^J) 



The coherence provides an immediate bound on the cumu- 
lative coherence: 

ll2(P) < l-iVWl 

Unfortunately, this bound is completely inadequate. 

To develop a better estimate, we instate some additional 
notation. Consider the matrix norm ||-||]^^2' which returns 
the maximum £2 norm of a column. Define the hollow Gram 
matrix 

G = — diag(^>*€>), 
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which tabulates the inner products between distinct columns 
of Let Rq be the W x W orthogonal projector onto the 
coordinates listed in 17. Elaborating on these definitions, we 
see that 



^l2m = \\RnGiI-Rf. 



< WRnGW 



(21) 



In particular, the cumulative coherence fJ,2{^) is dominated by 
the maximum column norm of G. 

When il is chosen at random, it turns out that we can use 
this upper bound to improve our estimate of the cumulative 
coherence ii2{^)- To incorporate this effect, we need the 
following result, which is a consequence of Theorem 3.2 of 
[65] combined with a standard decoupling argument (e.g., see 
[66, Lem. 14]). 

Proposition 10: Fix a W x W matrix G. Let R be an 
orthogonal projector onto K coordinates, chosen randomly 
from iWj. Forp = 21ogiy, 



EP\\RG\\,^^<8^logW\\G\\ 



\G\ 



1^2 ■ 



With these facts at hand, we can establish the following 
bound. 

Theorem 11 (Cumulative Coherence): Suppose the sam- 
pling rate 

R>C [K\ogW + \oi' W] . 

Draw an Rx W random demodulator 4>. Let il be a random 
set of K coordinates in IW}. Then 

1 



< 3W- 



x/161ogTy, 

Proof: Under our hypothesis on the sampling rate. 
Theorem [9] demonstrates that, except with probability W^^, 



\G\\ 



< 



logVF' 

where we can make c as small as we like by increasing the 
constant in the sampling rate. Similarly, Theorem [8] ensures 
that 



<2, 



except with probability W^^. We condition on the event F 
that these two bounds hold. We have P {F''} < 2M^~^ 
On account of the latter inequality and the fact ^VT) that 

11*11 = ^JWJR, we obtain 



IGII 



max 



^2 = max 11***6,^11 2 



< 11*1 



max 1 1 



< 2^JW|R. 



We have written for the wth standard basis vector in C'^. 

Let i?o be the (random) projector onto the K coordinates 
listed in fi. For -p = 2 log W , relation pT| ) and Proposition 10 
yield 



W{ii2{p)\F\<WmnG\\^^^ \ F\ 
< 8v/logW- 



logW 



■ 2\ 



< 



VbgW 



under our assumption on the sampling rate. Finally, we apply 
Markov's inequality to obtain 



/i2(r2) > 



VbgW 



F [ < e~"-^P = W-K 



By selecting the constant in the sampling rate sufficiently 
large, we can ensure c' is sufficiently small that 



1 



VI 6 log 



F )■ < 



We reach the conclusion 

^ {^^(") > vfmw] - + ^^^^^ - 

when we remove the conditioning. ■ 

B. Conditioning of a Random Submatrix 

We also require information about the conditioning of a 
random set of columns drawn from the random demodulator 
matrix *. To obtain this intelligence, we refer to the following 
result, which is a consequence of [65, Thm. 1] and a decou- 
pling argument. 

Proposition 12: Let Ahe aW xW Hermitian matrix, split 
into its diagonal and off-diagonal parts: A = E + G. Draw an 
orthogonal projector R onto K coordinates, chosen randomly 
from iWj. Forp = 21ogW^, 



EP WRARW < C 



log 14^11 All 



log K 
^ M|L , + — ||A| 

Suppose that il is a subset of fW}. Recall that *s7 is the 
column submatrix of * indexed by il. We have the following 
result. 

Theorem 13 (Conditioning of a Random Submatrix): 
Suppose the sampling rate 

R>C [/iTlogiy + log^ W] . 

Draw an RxW random demodulator, and let be a random 
subset of IW} with cardinality K. Then 

P{||*s*2*o-I|| >0.5}<3W-\ 
Proof: Define the quantity of interest 

Q^\\^h^n-I\\. 

Let Rq be the random orthogonal projector onto the coordi- 
nates listed in il, and observe that 

Q= ||fio(***-I)iln||. 

Define the matrix A = *** — !. Let us perform some 
background investigations so that we are fully prepared to 



apply Proposition 12 



We split the matrix 



A = E + G, 
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where E — diag($*$)— I and G = **f>-diag($*$) is the 
hollow Gram matrix. Under our hypothesis on the sampling 
rate, Theorem [8] provides that 

||2 



\E\\ = max 



1^112 



- 1 



< 0.15, 



except with probability W ^. It also follows that 

||diag(***)|| ^m&x\\ipJl < 1.15. 

We can bound |lG||i^2 repeating our earlier calculation 
and introducing the identity ([TTJi that ||4>|| = ^yW/R. Thus, 



IGII 



< 11*11 max \\ipuj\\2 < 



l.lbW 
R ■ 



Since W/R > 1, it also holds that 

|1G|| = II*** - diag(***)|| < 11*1 
Meanwhile, Theorem |9] ensures that 



1.15 < 



2.15T4^ 



R 



IGII 



< 



logW 

except with probability W~^. As before, we can make c as 
small as we like by increasing the constant in the sampling 
rate. We condition on the event F that these estimates are in 
force. So V {F''} < 2W-^. 



Now, invoke Proposition 12 to obtain 

E^[g|F]<c[logiy.^ 



inogl^ 1.15W K \.\bW 
W \ R ' R 

for p = 2 log W . Simplifying this bound, we reach 



0.15 



¥F[Q\F] < C 



C'KlogW 
R 



0.15. 



By choosing the constant in the sampling rate R sufficiently 
large, we can guarantee that 

KP[Q\F] < 0.3. 

Markov's inequality now provides 

P {g > 0.3e°-5 1 F} < e-°-^P = W'K 

Note that 0.3e° ''' < 0.5 to conclude that 

P{Q > 0.5} < W^^ + PiF"} < SW'K 

This is the advertised conclusion. ■ 

C. Recovery of Random Signals 

The literature contains powerful results on the performance 
of ii minimization for recovery of random signals that are 
sparse with respect to an incoherent dictionary. We have finally 
acquired the keys we need to start this machinery. Our major 
result for random signals. Theorem [T] is a consequence of the 
following result, which is an adaptation of [67, Thm. 14]. 

Proposition 14: Let * be an i? x W matrix, and let Q. be 
a subset of |W] for which 



• /i2(f^) < (161ogiy)-i/2, and 
. ||(*^,*o)-i <2. 

Suppose that s is a vector that satisfies 

• supp(s) C ri, and 

• sgn {s^) are i.i.d. uniform on the complex unit circle. 
Then s is the unique solution to the optimization problem 



subject to *?; = *s 



(22) 



except with probability 2W^^. 

Our main result. Theorem [T] is a corollary of this result. 
Corollary 15: Suppose that the sampling rate satisfies 

R>C [inogTy + log^ W] . 

Draw an R X W random demodulator *. Let s be a random 
amplitude vector drawn according to Model (A). Then the 
solution s to the convex program (|22]) equals s except with 
probability 8W~^. 

Proof: Since the signal s is drawn from Model (A), its 
support f2 is a uniformly random set of K components from 
IWJ. Theorem 11 ensures that 

VlDiogly 

except with probability 3W^^. Likewise, Theorem 13 
antees 



guar- 



l*^i*o-I|| 



< 0.5, 



except with probability 3W ^. This bound implies that all the 
eigenvalues of *j2*o lie in the range [0.5, 1.5]. In particular, 

||(*,*,*o)-l <2. 

Meanwhile, Model (A) provides that the phases of the nonzero 
entries of s are uniformly distributed on the complex unit 



circle. We invoke Proposition 14 to see that the optimization 
problem ( p2| ) recovers the amplitude vector s from the obser- 
vations *s, except with probability 2W~^. The total failure 
probability is 81^"^ ■ 

Appendix III 
Stable Recovery 

In this appendix, we establish much stronger recovery 
results under a slightly stronger assumption on the sampling 
rate. This work is based on the restricted isometry property 
(RIP), which enjoys the privilege of a detailed theory. 

A. Background 

The RIP is a formaUzation of the statement that the sampling 
matrix preserves the norm of all sparse vectors up to a 
small constant factor. Geometrically, this property ensures that 
the sampling matrix embeds the set of sparse vectors in a 
high-dimensional space into the lower-dimensional space of 
samples. Consequently, it becomes possible to recover sparse 
vectors from a small number of samples. Moreover, as we will 
see, this property also allows us to approximate compressible 
vectors, given relatively few samples. 

Let us shift to rigorous discussion. We say that an i? x 
matrix * has the RIP of order N with restricted isometry 
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constant Sn G (0,1) when the following inequalities are in 
force: 



< S 



N 



whenever 



\x\\a<N. 



(23) 



Recall that the function 



counts the number of nonzero 



entries in a vector. In words, the definition states that the sam- 
pling operator produces a small relative change in the squared 
£2 norm of an A^-sparse vector. The RIP was introduced by 
Candes and Tao in an important paper [25]. 

For our purposes, it is more convenient to rephrase the RIR 
Observe that the inequalities (|23]l hold if and only if 



a;*(*** I)a; 



< 6 



N 



whenever 



mo<N. 



The extreme value of this Rayleigh quotient is clearly the 
largest magnitude eigenvalue of any NxN principal submatrix 
of 

Let us construct a norm that packages up this quantity. 
Define 



II A III — sup 

|n|<w 



IOxf2 



In words, the triple-bar norm returns the least upper bound on 
the spectral norm of any NxN principal submatrix of A. 
Therefore, the matrix $ has the RIP of order N with constant 
(5 AT if and only if 

III*** -III < Sn. 

Referring back to ( pOj i, we may write this relation in the more 
suggestive form 

III*** -E***||| < Sn- 
In the next section, we strive to achieve this estimate. 

B. RIP for Random Demodulator 

Our major result is that the random demodulator matrix 
has the restricted isometry property when the sampling rate is 
chosen appropriately. 

Theorem 16 (RIP for Random Demodulator): Fix 6 > 0. 
Suppose that the sampling rate 

R>CS'^-Nlog^{W). 

Then an R x W random demodulator matrix * has the 
restricted isometry property of order N with constant Sn < S, 
except with probability 0{W~^). 

There are three basic reasons that the random demodulator 
matrix satisfies the RIP. First, its Gram matrix averages to the 
identity. Second, its rows are independent. Third, its entries are 
uniformly bounded. The proof shows how these three factors 
interact to produce the conclusion. 

In the sequel, it is convenient to write z^z for the rank-one 
matrix zz*. We also number constants for clarity. 

Most of the difficulty of argument is hidden in a lemma due 
to Rudelson and Vershynin [26, Lem. 3.6]. 

Lemma 17 (Rudelson-Vershynin): Suppose that {Zr} is a 
sequence of R vectors in where R < W, and assume 



that each vector satisfies the bound H^rlloo — ^- Let {£,r} be 
an independent Rademacher series. Then 

1/2 



< 



where 



f3 < CiBVNlog^W) 



The next lemma employs this bound to show that, in 
expectation, the random demodulator has the RIP. 

Lemma 18: Fix S E (0, 1). Suppose that the sampling rate 

R>C2S~'^-Nlog^W. 

Let * be an i? X random demodulator. Then 

E III*** -III < S. 

Proof: Let z* denote the rth row of *. Observe that 
the rows are mutually independent random vectors, although 
their distributions differ. Recall that the entries of each row are 
uniformly bounded, owing to Lemma [5] We may now express 
the Gram matrix of * as 

ER 
Zj. (g) Zj.. 
r—1 

We are interested in bounding the quantity 



£; = E||***-I|| =E 

= E 



^ {Zr ® Zr - z'^ ® z'^) 



where {2;'.} is an independent copy of {z^}. 
The symmetrization result. Proposition [4j yields 

E < 2E 1^ ^rZr (8) Zr 

where {^^1 is a Rademacher series, independent of every- 
thing else. Conditional on the choice of {zr}, the Rudelson- 
Vershynin lemma results in 

E <2E 1/3- «) Zr 

where 



and B = max^ 

According to Lemma [5] 



P < CiBVNlog^W 
(Pku:\ is a random variable. 



We may now apply the Cauchy-Schwarz inequahty to our 
bound on E to reach 



E < 2V6C1I 



iNlog-'W 



R 



(e|||E.^- 



1/2 



Add and subtract an identity matrix inside the norm, and 
invoke the triangle inequality. Note that |||I||| = 1, and identify 
a copy of E on the right-hand side: 



E < 



^2AClNlo^ 




R 


'24C?iVlof 





R 



E V z 

III ^-^r 

{E+lY'\ 



I 



1/2 
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Solutions to the relation E < ^{E 
whenever 7 < 1. 
We conclude that 



1)1/2 Q^gy E < 



E < 



R 



whenever the fraction is smaller than one. To guarantee that 
E < S, then, it suffices that 

R>G2S-^-Nlog^W. 

This point completes the argument. ■ 
To finish proving the theorem, it remains to develop a large 
deviation bound. Our method is the same as that of Rudelson 
and Vershynin: We invoke a concentration inequality for sums 
of independent random variables in a Banach space. See [26, 
Thm. 3.8], which follows from [61, Thm. 6.17]. 

Proposition 19: Let li , . . . , yR be independent, symmetric 
random variables in a Banach space X, and assume each 
random variable satisfies the bound ||l^r||x — ^ almost surely. 



r\lx 



Then 



P{y > C3 [uEF + < e"" +e-* 

for all u,t > 1. 

In words, the norm of a sum of independent, symmetric, 
bounded random variables has a tail bound consisting of two 
parts. The norm exhibits subgaussian decay with "standard de- 
viation" comparable to its mean, and it exhibits subexponential 
decay controlled by the size of the summands. 

Proof: [Theorem [16) We seek a tail bound for the random 
variable 



Z = 



- IE 



E4 «> O 



As before, z* is the rth row of $ and z'^. is an independent 
copy of Zr- 

To begin, we express the constant in the stated sampling 
rate as C = C2 • c"^, where c will be adjusted at the end of 
the argument. Therefore, the sampling rate may be written as 

-2 



i?> C2 



cS 



Lemma [T8| implies that 

EZ < 



cS 



< 0(5. 



As described in the background section, we can produce a 
tail bound for Z by studying the symmetrized random variable 



Y = 



E 



{Zr (g) , 



z'^ (g) z'j.) 



where {^^1 is a Rademacher sequence independent of every- 
thing. For future reference, use the first representation of Y to 
see that 

2cS 

EY < 2EZ < , 

where the inequality follows from the triangle inequality and 
identical distribution. 



Observe that Y is the norm of a sum of independent, 
symmetric random variables in a Banach space. The tail bound 



of Proposition 19 also requires the summands to be bounded 
in norm. To that end, we must invoke a truncation argument. 
Define the (independent) events 

10 log H/' 



Fr - |max|||2;^||^, 11411^} 



< 



R 



and 



In other terms, F is the event that two independent random 
demodulator matrices both have bounded entries. Therefore, 

after two applications of Lemma |5] 

Under the event F, we have a hard bound on the triple-norm 
of each summand in Y. 

B = maxr \Zr ® Zr — z'^® z'r\ 

< ma,Xr {\lzr (g) 2:^11 + |||4 ® 4I||) • 

For each r, we may compute 



IN. 



= sup 

|f2|<Ar 

sup I 

|n|<Af 



{Zr (g) Zr) 



2 ^ lOiVlogPF 

I 00 — 



R 



where the last inequality follows from F. An identical estimate 
holds for the terms involving 2;'.. Therefore, 

20N log W 



B < 



R 



Our hypothesized lower bound for the sampling rate R gives 



B < 20iVlogM^ 



< 



c6 



C2iVlog^M^ - logM^' 

where the second inequality certainly holds if c is a sufficiently 
small constant. 

Now, define the truncated variable 



4 «)4)i/ 



By construction, the triple-norm of each summand is bounded 
by B. Since Y and Ftrunc coincide on the event F, we have 

¥{Y > v} <F{Y > V \ F} - FiFj + F {F"} 

= P {>^trunc >v \ F}-F{F} + F {F^} 

-Pmrunc >«}+P{F"} 

for each v > 0. According to the contraction principle [61, 
Thm. 4.4], the bound 

E« Ftrunc < y 

holds pointwise for each choice of {zr} and {2^}- Therefore, 

EFtrunc = E,^,,;, E^ Ftrunc < E,„,,;, E^Y^EY. 

RecalUng our estimate for E Y, we see that 

2cS 



EK,-„ 



< 
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We now apply Proposition [19] to the symmetric variable 
^trunc- The bound reads 



rune r 



tB)}< 



Select u — y/log W and t — log W, and recall the bounds on 
Eltrunc and on B to obtain 

P{>"trunc > C3(2C(5 + C(5)} < 2W-\ 

Using the relationship between the tails of Y and ytiunc, we 
reach 

P {r > -SC^CS} < P {Ftrunc > 3C3CJ} + P {F^} 

Finally, the tail bound for Y yields a tail bound for Z via 
relation ([TSj: 

¥{Z >2EZ + u} <2¥{Y > u} . 

As noted, E Z < cS < C^cS. Therefore, 

P{Z> 5G3cd} <P{Z >2EZ + SCacJ} 

< 2P{r > 3C3c5} 

< 8W~\ 

To complete the proof, we select c < (SCa)^^. ■ 

C. Signal Recovery under the RIP 

When the sampling matrix has the restricted isometry prop- 
erty, the samples contain enough information to approximate 
general signals extremely well. Candes, Romberg, and Tao 
have shown that signal recovery can be performed by solv- 
ing a convex optimization problem. This approach produces 
exceptional results in theory and in practice [24]. 

Proposition 20: Suppose that ^ is an R x W matrix that 
verifies the RIP of order 2K with restricted isometry constant 
S2K < c- Then the following statement holds. Consider an 
arbitrary vector s in C^, and suppose we collect noisy 
samples 

y = + 1/ 

where \\v>\\2 < V- Every solution s to the optimization problem 
min||t;||-^ subject to \\^v — yW^ < f], (24) 
approximates the target signal: 

1 „ 



s - 



s\\2 < C max < 77 



s 



sk 



where sk is a best K-spaise approximation to s with respect 
to the £1 norm. 

An equivalent guarantee holds when the approximation s is 
computed using the CoSaMP algorithm [21, Thm. A]. 

Combining Proposition |20] with Theorem [T6j we obtain our 
major result. Theorem [2] 

Corollary 21: Suppose that the sampling rate 

R>CKlog^W. 

An R X W random demodulator matrix verifies the RIP of 
order 2K with constant S2K < c, except with probability 
0{W~^). Thus, the conclusions of Proposition 20 are in force. 
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